Skip to content

Conversation

@cpretzer
Copy link
Contributor

@cpretzer cpretzer commented Jul 9, 2019

This relates to linkerd2 #2970.

The individual reporting the issue found this output in his logs:

Another app is currently holding the xtables lock. Perhaps you want to use the -w option? Aborting firewall configuration

This pull request addresses that by adding a flag which will allow a user to specify the iptables -w switch.

cpretzer added 2 commits July 8, 2019 18:53
Signed-off-by: Charles Pretzer <charles@buoyant.io>
Signed-off-by: Charles Pretzer <charles@buoyant.io>
@cpretzer cpretzer requested a review from grampelberg July 9, 2019 03:44
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change @cpretzer 👍 Should get us closer to better CNI support 🥇

I was about to suggest we add the ability to actually set the timeout in seconds, but it's probably better to leave it as is; if I understand correctly that means it'll block forever, which will make the DaemonSet fail on setup, which is what we want if the lock can't be acquired.

You might already be on it, but I just wanted to confirm this needs to be paired with a change here
https://github.com/linkerd/linkerd2/blob/master/cni-plugin/main.go#L195-L204
setting that new flag. We probably want to always set it to true over there. If not, that change would need to be paired with a new flag for the linkerd install-cni subcommand.

@grampelberg
Copy link
Contributor

I’m still a little uncomfortable with the implications here, it feels like we might be papering over a more serious problem. What does everyone else do?

@cpretzer
Copy link
Contributor Author

cpretzer commented Jul 9, 2019

@alpeb Thanks for the feedback. The cni-plugin code that you reference is exactly what I needed for the next step of this change.

I agree that using a timeout is a more robust solution. That implementation requires a deeper conversation that answers questions like:

  • What happens if the lock never becomes available?
    -- We can log a warning that the rule can't be added and this will affect the traffic to the pods. This should should be displayed in the dashboard and/or tap output.
  • Will iptables allow us to find the process that owns the xtables lock? This will help for diagnosing deadlock issues

There are more things to think through, and this is just a start

@cpretzer
Copy link
Contributor Author

cpretzer commented Jul 9, 2019

@grampelberg

I agree with you that this may be an exercise in can kicking.

There is an issue and a couple of pull requests in the kubernetes repos that address this:
kubernetes/kubernetes#7370
https://github.com/kubernetes/kubernetes/pull/55945/files
https://github.com/kubernetes/kubernetes/pull/29135/files

thockin wrote a retry mechanism:
https://github.com/kubernetes/kubernetes/pull/8264/files

The goal with this first pass is to give the submitter of linkerd/linkerd2#2970 something to try.

As we develop a more robust solution, we also need to consider implementing the cmdDel function. My spidey sense tingles when I consider that the description of the problem includes a large cluster with lots of containers. I can see a case where many entries are added to iptables and if they are not cleaned up, then the process of adding more entries takes longer, thereby introducing this lock condition.

Or, as you've mentioned, it could be a deadlock coming from somewhere else. Either way, I hope that this first pass will help to understand the root cause.

@cpretzer cpretzer merged commit 8d2e800 into master Jul 12, 2019
stevej pushed a commit that referenced this pull request Dec 2, 2022
Signed-off-by: Steve Jenson <stevej@buoyant.io>
stevej pushed a commit that referenced this pull request Dec 29, 2022
* modifying import paths and making a temporary copy of testutil/annotations.go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* removed testutil, dockerized cni installer tests now pass

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* moving internal to pkg/linkerd-, removing Dockerfile until fixed, changining imports, removing linkerd2 k8s client with client-go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* gofmt install-cni_test.go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* go mod updates

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* adding pkg to Docker image

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* updating dev from v32 to v35 for go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* moving back to old dev image

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* use dev:v32-go for go lint workflow

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fixing linter complaints

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fixing linter complaints

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #1

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #2

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #3

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #4

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #5

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #6

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* Replace pkg/ with internal/ (#148)

* Replace pkg/ with internal/

There's no need for a public library export. We can share code within
this repo via the `internal` directory.

* simplify package names

Signed-off-by: Oliver Gould <ver@buoyant.io>

* adding internal back. whoopsie

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* bumping dev go version

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* replace deprecated ioutil functions with io functions.

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* increasing timeout to help with linter issues, adding verbose

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* replace TODO with literals, wait for the linter to complain so we can give it the magic incantation to sleep now

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* more linter

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* gofmt

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* swap position of comment and argument as the linter has an opinion here, too

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* Update cni-plugin/main.go

Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>

* Update cni-plugin/main.go

Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>

* Update cni-plugin/main.go

Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>

* Update cni-plugin/main.go

Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>

* simplify lint call

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* removed unneeded abstraction

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* linter for cni-plugin and all go code

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* giving flags to go linter

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* run the test on the moved internal package

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* adding keys back for annotation lookup

Signed-off-by: Steve Jenson <stevej@buoyant.io>

Signed-off-by: Steve Jenson <stevej@buoyant.io>
Signed-off-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
stevej pushed a commit that referenced this pull request Jan 18, 2023
* modifying import paths and making a temporary copy of testutil/annotations.go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* removed testutil, dockerized cni installer tests now pass

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* moving internal to pkg/linkerd-, removing Dockerfile until fixed, changining imports, removing linkerd2 k8s client with client-go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* gofmt install-cni_test.go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* go mod updates

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* adding pkg to Docker image

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* updating dev from v32 to v35 for go

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* moving back to old dev image

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* use dev:v32-go for go lint workflow

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fixing linter complaints

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fixing linter complaints

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #1

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #2

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #3

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #4

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #5

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* turning off noisy lint #6

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* adding in Dockerfile, just rules for building, and a workflow for testing the cni-plugin installer script

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* remember to setup docker

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* remember to setup docker-qemu

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* where is docker?

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* back to a named ubuntu version, removing devcontainer

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* we need just

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* WIP import of CNI plugin integration test environment. does not run due to image pull errors.

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* rewriting just rules to match new rules

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* bumping dev version, renaming smoke test

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* WIP for running smoke tests

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* go workflow fix

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* also rid ourselves of ioutil in this branch

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* adding a second, passing test

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* let's run the test in CI

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* name the test properly for CI to run it

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* made the installer integration tests more in-line with the other integration tests

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* cni-plugin integration test workflow

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* breaking up steps, renaming test

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* just

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* bringing changes from linkerd2 over

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* tests running within cni-plugin context

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* create service account and don't delete so we can inspect

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fix namespaces, use matei's k3d/k3s mountPaths in the hopes that CNI will run in our pod.

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* WIP for debugging why CNI won't run in my own pods despite everything looking normal

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* adding whitespace, fixing comment, removing unneeded variable

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fixing some small things

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* improving Dockerfile, going back to edge for linkerd-cni

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* cleaned up Dockerfile

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* using --link for 50% size improvement

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* cleanup unusued functions, run network-validator before test suite

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* remove qemu setup, add comment about log level

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* add wiring to see cni-net-dir and check for kubeconfig

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* checking that linkerd-cni is the last plugin in the conflist

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* renaming smoke_test to flannel

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* rename files, update justfile

* name a test file _test so the test runner will run my test

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* renaming to flannel

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* remove hardcoded filename

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* clarified comment

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fixed merge conflict error

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fix log levels

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* fix a log level

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* run test on all files in ./cni-plugin

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* hcomment explaining why there's no ENTRYPOINT

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* use a map instead of an array for simplicity

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* abstract which integration test subdirectory gets used, add internal to ensure those packages are tested

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* go.yml is already running these tests are there no integration tests in there to run

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* breaking up a line

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* renaming SUBDIRECTORY to SCENARIO and renaming a run just target to flannel to signify that this is the rule to crib for other scenarios

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* better error handling of the cleanup() function, print more diagnostic information if linkerd-cni rollout fails

Signed-off-by: Steve Jenson <stevej@buoyant.io>

* add error handling for describe ds and logs

Signed-off-by: Steve Jenson <stevej@buoyant.io>

Signed-off-by: Steve Jenson <stevej@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants